The complexity of interior point methods for solving discounted turn-based stochastic games

Thomas Dueholm Hansen    Rasmus Ibsen-Jensen

Abstract 



We study the problem of solving discounted, two player, turn based, stochastic games (2TBSGs).
Jurdziński and Savani showed that in the case of deterministic games the problem can
be reduced to solving P-matrix linear complementarity problems (LCPs). We show that the
same reduction also works for general 2TBSGs. This implies that a number of interior point
methods can be used to solve 2TBSGs. We consider two such algorithms: the unified interior
point method of Kojima, Megiddo, Noma, and Yoshise, and the interior point potential reduction
algorithm of Kojima, Megiddo, and Ye. The algorithms run in time $O((1+\kappa)n^{3.5}L)$ and
$O(\frac{-\delta}{\theta} n^4 \log \epsilon^{-1})$, respectively, when applied to an LCP defined by an $n \times n$ matrix $M$ that can
be described with $L$ bits, and where the potential reduction algorithm returns an $\epsilon$-optimal
solution. The parameters $\kappa$, $\delta$, and $\theta$ depend on the matrix $M$. We show that for 2TBSGs with
$n$ states and discount factor $\gamma$ we get $\kappa = \Theta(\frac{n}{(1-\gamma)^2})$, $-\delta = \Theta(\frac{\sqrt{n}}{1-\gamma})$, and $1/\theta = \Theta(\frac{n}{(1-\gamma)^2})$ in
the worst case. The lower bounds for $\kappa$, $-\delta$, and $1/\theta$ are all obtained using the same family of
deterministic games.

1 Introduction



Two-player turn-based stochastic games (2TBSGs). A two-player turn-based stochastic
game (2TBSG) is a game played by two players (Player 1 and Player 2) on a finite state graph
for an infinite number of rounds. The graph is partitioned into two sets of states $S^1$ (belonging to
Player 1) and $S^2$ (belonging to Player 2). Whenever the current state $i$ is from $S^k$, Player $k$ chooses
an action $a$ emanating from state $i$, and the next state is then given by a probability distribution
depending on $a$. In each round there is a probability of $1 - \gamma > 0$ of ending the game, where $\gamma$ is
the discount factor of the game. Every action has an associated cost. The objective of Player 1 is
to minimize the expected sum of costs, and the objective of Player 2 is to maximize the expected
sum of costs, i.e., the game is a zero-sum game. Our results will be for the case when all states
have 2 actions.

The class of (turn-based) stochastic games was introduced by Shapley [19] in 1953, and it has 
received much attention over the following decades. For books on the subject see, e.g., Neyman 
and Sorin [16] and Filar and Vrieze [5]. Shapley showed that states in such games have a value
that can be enforced by both players (determinacy). We will in this paper consider the problem of
solving such games, that is, for each state i finding the value of that state. 

Classical algorithms for solving 2TBSGs. 2TBSGs form an intriguing class of games whose 
status in many ways resembles that of linear programming 40 years ago. They can be solved effi- 
ciently with strategy iteration algorithms, resembling the simplex method for linear programming, 
but no polynomial time algorithm is known. Strategy iteration algorithms were first described 
by Rao et al. [17]. Hansen, Miltersen, and Zwick [8] recently showed that the standard strategy
iteration algorithm solves 2TBSGs with a fixed discount, $\gamma$, in strongly polynomial time. Prior to
this result a polynomial bound by Littman [15] was known for the case when $\gamma$ is fixed. Littman
showed that Shapley's [19] value iteration algorithm can be used to solve discounted 2TBSGs in
time $O(\frac{nmL}{1-\gamma} \log \frac{1}{1-\gamma})$, where $n$ is the number of states, $m$ is the number of actions, and $L$ is the
number of bits needed to represent the game. For a more thorough introduction to the background
of the problem we refer to Hansen et al. [8] and the references therein.

Interior point methods. One may hope that a polynomial time algorithm for solving 2TBSGs
in the general case, when the discount factor $\gamma$ is not fixed (i.e., when it is given as part of the
input), can be obtained through the use of interior point methods. This was also suggested by
Jurdziński and Savani [10] and Hansen et al. [8]. The first interior point method was introduced
by Karmarkar [11] in 1984 to solve linear programs in polynomial time. Since then the technique
has been studied extensively and applied in other contexts. See, e.g., Ye [20]. In particular, interior
point methods can be used to solve P-matrix linear complementarity problems, which, in turn, can
be used to solve 2TBSGs. This will be the focus of the present paper.

P-matrix linear complementarity problems (LCPs). A linear complementarity problem
(LCP) is defined as follows: Given an $(n \times n)$-matrix $M$ and a vector $q \in \mathbb{R}^n$, find two vectors
$w, z \in \mathbb{R}^n$, such that $w = q + Mz$ and $w^T z = 0$ and $w, z \ge 0$. LCPs have also received much
attention. For books on the subject see, e.g., Cottle et al. [4] and Ye [20].

Jurdziński and Savani [10] showed that solving a deterministic 2TBSG $G$, i.e., a game in which every action leads
to a single state with probability 1, can be reduced to solving an LCP $(M, q)$. Gärtner and Rüst
[6] gave a similar reduction from simple stochastic games, a class of games that is polynomially
equivalent to 2TBSGs (see [1]). Moreover, Jurdziński and Savani [10], and Gärtner and Rüst [6],
showed that the resulting matrix $M$ is a P-matrix (i.e., all principal sub-matrices have a positive
determinant). We show that the reduction of Jurdziński and Savani also works for general 2TBSGs,
and that the resulting matrix $M$ is again a P-matrix.

Krishnamurthy et al. [2] recently gave a survey on various stochastic games and LCP formu- 
lations of those. 

The unified interior point method. There exist various interior point methods for solving
P-matrix LCPs. One algorithm that we consider in this paper is the unified interior point method of
Kojima, Megiddo, Noma, and Yoshise [12]. The unified interior point method solves an LCP whose
matrix $M \in \mathbb{R}^{n \times n}$ is a $P_*(\kappa)$-matrix in time $O((1+\kappa)n^{3.5}L)$, where $L$ is the number of bits needed
to describe $M$. A matrix $M$ is a $P_*(\kappa)$-matrix, for $\kappa \ge 0$, if and only if for all vectors $x \in \mathbb{R}^n$, we
have that $x^T(Mx) + 4\kappa \sum_{i \in \delta_+(M)} x_i(Mx)_i \ge 0$, where $\delta_+(M) = \{i \in [n] \mid x_i(Mx)_i > 0\}$. If $M$ is a
P-matrix then it is also a $P_*(\kappa)$-matrix for some $\kappa \ge 0$. Hence, the algorithm can be used to solve
2TBSGs.

Following the work of Kojima et al. [12], many algorithms with complexity polynomial in $\kappa$, $L$,
and n have been introduced. For recent examples see, e.g., [21 [21 [9]. 



An interior point potential reduction algorithm. The second interior point method that we
consider in this paper is the potential reduction algorithm of Kojima, Megiddo, and Ye [13]. See
also Ye [20]. The potential reduction algorithm is an interior point method that takes as input
a P-matrix LCP and a parameter $\epsilon > 0$, and produces an approximate solution $w, z$, such that
$w^T z < \epsilon$, $w = q + Mz$, and $w, z \ge 0$. The running time of the algorithm is $O(\frac{-\delta}{\theta} n^4 \log \epsilon^{-1})$,
where $\delta$ is the least eigenvalue of $\frac{M + M^T}{2}$, and $\theta$ is the positive P-matrix number of $M$, that is,
$\theta = \min_{\|x\|_2 = 1} \max_{i \in [n]} x_i(Mx)_i$. We refer to $\frac{-\delta}{\theta}$ as the condition number of $M$. The analysis
involving the condition number appears in Ye [20].

In his Ph.D. thesis, Rüst [18] shows that there exists a simple stochastic game for which the
P-matrix LCP resulting from the reduction of Gärtner and Rüst [6] has a large condition number.
The example of Rüst contains a parameter that can essentially be viewed as the discount factor
$\gamma$ for 2TBSGs, and he shows that the condition number can depend linearly on $\frac{1}{1-\gamma}$. To be more
precise, Rüst [18] shows that the matrix $M$ resulting from the reduction of Gärtner and Rüst [6]
has positive P-matrix number smaller than 1, and that the smallest eigenvalue of the matrix $\frac{M + M^T}{2}$
is $-\Omega(\frac{1}{1-\gamma})$. This bound can be viewed as a precursor for some of our results.

1.1 Our contributions 

Our contributions are as follows. We show that the reduction by Jurdziński and Savani [10] from
deterministic 2TBSGs to P-matrix LCPs generalizes to 2TBSGs without modification. Although
the reduction is the same we provide an alternative proof that the resulting matrix is a P-matrix.
Furthermore, let $G$ be any 2TBSG with $n$ states and let $M_G$ be the matrix obtained from the
reduction of Jurdziński and Savani [10].

(i) We show that $M_G$ is a $P_*(\kappa)$-matrix for $\kappa = \frac{n}{(1-\gamma)^2}$. This implies that the running time of
the unified interior point method of Kojima et al. [12] for 2TBSGs is at most $O(\frac{n^{4.5} L}{(1-\gamma)^2})$. We
also show that there exists a family of 2TBSGs, $G_n$, such that the corresponding matrices,
$M_{G_n}$, are not $P_*(\kappa)$-matrices for $\kappa = o(\frac{n}{(1-\gamma)^2})$.

(ii) We show that the matrix $\frac{M_G + M_G^T}{2}$ has smallest eigenvalue at least $-O(\frac{\sqrt{n}}{1-\gamma})$. We also show
that there exists a family of 2TBSGs, $G_n$, such that the corresponding matrices $\frac{M_{G_n} + M_{G_n}^T}{2}$
have smallest eigenvalue less than $-\Omega(\frac{\sqrt{n}}{1-\gamma})$.

(iii) Finally, we show that the positive P-matrix number $\theta(M_G)$ is at least $\Omega(\frac{(1-\gamma)^2}{n})$. We also
show that there exists a family of 2TBSGs, $G_n$, such that the corresponding matrices $M_{G_n}$
have positive P-matrix number, $\theta(M_{G_n})$, at most $O(\frac{(1-\gamma)^2}{n})$.

Notice that (ii) and (iii) together imply that the running time of the potential reduction
algorithm of Kojima et al. [13] for 2TBSGs is at most $O(\frac{n^{5.5} \log \epsilon^{-1}}{(1-\gamma)^3})$. The family of 2TBSGs $G_n$
mentioned in (i), (ii), and (iii) is, in fact, the same. Hence, we get matching upper and lower
bounds for the parameters of both the unified interior point method and the potential reduction
algorithm. Also, the games $G_n$ are deterministic, so the same lower bounds hold for deterministic
2TBSGs.



It should be noted that although our results for existing interior point methods for solving 
2TBSGs are negative, it is still possible that other (possibly new) interior point methods can solve 
2TBSGs efficiently. In fact, we believe that this remains an important question for future research. 

1.2 Overview 

In Section 2 we formally introduce the various classes of problems under consideration. More
precisely, in Subsection 2.1 we define LCPs, and in Subsection 2.2 we define 2TBSGs. In Subsection 2.3
we show that the reduction by Jurdziński and Savani [10] from deterministic 2TBSGs
to P-matrix LCPs generalizes to general 2TBSGs. In Section 3 we estimate the $\kappa$ for which the
matrices of 2TBSGs are $P_*(\kappa)$-matrices, thus giving a bound on the running time of the unified
interior point method of Kojima et al. [12]. In Section 4 we bound the smallest eigenvalue and
the positive P-matrix number, thus giving a bound on the running time of the potential reduction
algorithm of Kojima et al. [13].

2 Preliminaries 

2.1 Linear complementarity problems 

Definition 1 (Linear complementarity problems) A linear complementarity problem (LCP)
is a pair $(M, q)$, where $M$ is an $(n \times n)$-matrix and $q$ is an $n$-vector. A solution to the LCP $(M, q)$
is a pair of vectors $(w, z) \in \mathbb{R}^n \times \mathbb{R}^n$ such that:

$$w = q + Mz$$
$$w^T z = 0$$
$$w, z \ge 0$$


We will now define various types of matrices for which interior point methods are known to 
solve the corresponding LCPs. 

Definition 2 (P-matrix) A matrix $M \in \mathbb{R}^{n \times n}$ is a P-matrix if and only if all principal
sub-matrices have a positive determinant.

The following lemma gives an alternative definition of P-matrices (see, e.g., [4, Theorem 3.3.4]).

Lemma 3 A matrix $M \in \mathbb{R}^{n \times n}$ is a P-matrix if and only if for all $n$-vectors $x \ne 0$ there is an
$i \in [n]$ such that $x_i(Mx)_i > 0$.
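Definition 2 and Lemma 3 can be tried out on small matrices. The following is a minimal sketch, not part of the paper: the function names `det`, `is_p_matrix`, and `sign_witness` are illustrative assumptions, and the check enumerates all principal sub-matrices, so it takes exponential time and is only meant for tiny examples.

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (small n only)."""
    n = len(m)
    if n == 0:
        return Fraction(1)
    if n == 1:
        return Fraction(m[0][0])
    total = Fraction(0)
    for col in range(n):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * m[0][col] * det(minor)
    return total

def is_p_matrix(m):
    """Definition 2: every principal sub-matrix has a positive determinant."""
    n = len(m)
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            sub = [[m[i][j] for j in idx] for i in idx]
            if det(sub) <= 0:
                return False
    return True

def sign_witness(m, x):
    """Lemma 3: return an index i with x_i (Mx)_i > 0, or None if there is none."""
    n = len(x)
    mx = [sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
    for i in range(n):
        if x[i] * mx[i] > 0:
            return i
    return None

# A non-symmetric P-matrix, and a matrix that is not a P-matrix.
p_example = [[1, -3], [0, 1]]
non_p_example = [[0, 1], [1, 0]]
```

For `p_example` every non-zero `x` has a witness index, whereas for `non_p_example` the vector `[1, -1]` has none, in line with Lemma 3.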

Definition 4 (Positive P-matrix number) For a matrix $M \in \mathbb{R}^{n \times n}$, the positive P-matrix
number is

$$\theta(M) = \min_{\|x\|_2 = 1} \max_{i \in [n]} x_i(Mx)_i .$$

Note that, according to Lemma 3, $\theta(M) > 0$ if and only if $M$ is a P-matrix.



Definition 5 ($P_*(\kappa)$-matrix) A matrix $M \in \mathbb{R}^{n \times n}$ is a $P_*(\kappa)$-matrix, for $\kappa \ge 0$, if and only if
for all vectors $x \in \mathbb{R}^n$:

$$\sum_{i \in \delta_-(M)} x_i(Mx)_i + (1 + 4\kappa) \sum_{i \in \delta_+(M)} x_i(Mx)_i \ge 0 ,$$

where $\delta_-(M) = \{i \in [n] \mid x_i(Mx)_i < 0\}$ and $\delta_+(M) = \{i \in [n] \mid x_i(Mx)_i > 0\}$. We say that $M$ is
a $P_*$-matrix if and only if it is a $P_*(\kappa)$-matrix for some $\kappa \ge 0$.

Kojima et al. [12] showed that every P-matrix is also a $P_*$-matrix. By definition, a matrix $M$ is
a $P_*(0)$-matrix if and only if it is positive semi-definite. The set of symmetric P-matrices is exactly
the set of symmetric positive definite matrices.
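To make the inequality of Definition 5 concrete, the sketch below evaluates its left-hand side for a single vector and computes the smallest value of the parameter for which that one vector satisfies it. The helper names are illustrative assumptions, and testing a single vector only yields a lower bound on the parameter needed for the whole matrix.

```python
from fractions import Fraction

def split_sums(m, x):
    """Return (sum of negative terms, sum of positive terms) of x_i (Mx)_i."""
    n = len(x)
    terms = [x[i] * sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
    neg = sum(t for t in terms if t < 0)
    pos = sum(t for t in terms if t > 0)
    return neg, pos

def p_star_lhs(m, x, kappa):
    """Left-hand side of the inequality in Definition 5 for the vector x."""
    neg, pos = split_sums(m, x)
    return neg + (1 + 4 * kappa) * pos

def kappa_needed(m, x):
    """Smallest kappa >= 0 for which x satisfies the inequality (None if no kappa works)."""
    neg, pos = split_sums(m, x)
    if neg + pos >= 0:
        return Fraction(0)
    if pos == 0:
        return None
    return Fraction(-neg - pos) / (4 * pos)

example = [[1, -4], [0, 1]]   # a P-matrix that is not positive semi-definite
witness = [1, 1]
```

For `example` and `witness` the terms are -3 and 1, so the inequality fails for parameter 0 and first holds at parameter 1/2.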

2.2 Two-player turn-based stochastic games 

Definition 6 (Two-player turn-based stochastic games) A two-player turn-based stochastic
game (2TBSG) is a tuple, $G = (S^1, S^2, (A_i)_{i \in S^1 \cup S^2}, p, c, \gamma)$, where

• $S^k$, for $k \in \{1, 2\}$, is the set of states belonging to Player $k$. We let $S = S^1 \cup S^2$ be the set of
all states, and we assume that $S^1$ and $S^2$ are disjoint.

• $A_i$, for $i \in S$, is the set of actions applicable from state $i$. We let $A = \bigcup_{i \in S} A_i$ be the set of
all actions. We assume that $A_i$ and $A_j$ are disjoint for $i \ne j$, and that $A_i \ne \emptyset$ for all $i \in S$.

• $p : A \to \Delta(S)$ is a map from actions to probability distributions over states.

• $c : A \to \mathbb{R}$ is a function that assigns a cost to every action.

• $0 < \gamma < 1$ is a (positive) discount factor.

We let $n = |S|$ and $m = |A|$. Furthermore, we let $A^k = \bigcup_{i \in S^k} A_i$, for $k \in \{1, 2\}$. Figure 1
shows an example of a simple 2TBSG. The large circles and squares represent the states controlled
by Player 1 and 2, respectively. The edges leaving the states represent actions. The cost of an
action is shown inside the corresponding diamond shaped square, and the probability distribution
associated with the action is shown by labels on the edges leaving the diamond shaped square.

We say that an action $a$ is deterministic if it moves to a single state with probability 1, i.e.,
if $p(a)_j = 1$ for some $j \in S$. If all the actions of a 2TBSG $G$ are deterministic we say that $G$ is
deterministic.

Plays and outcomes. A 2TBSG is played as follows. At the beginning of a play a pebble is
placed on some state $i_0 \in S$. Whenever the pebble is moved to a state $i \in S^k$, Player $k$ chooses an
action $a \in A_i$ and the pebble is moved at random according to the probability distribution $p(a)$ to
a new state $j$. Let $a_t$ be the $t$-th chosen action for every $t \ge 0$. Then the outcome of the play, paid
by Player 1 to Player 2, is $\sum_{t \ge 0} \gamma^t \cdot c(a_t)$.

We will now give a way to explicitly represent a 2TBSG using vectors and matrices. It will later
simplify the notation in our constructions and proofs. Figure 1 also shows such a representation of
a 2TBSG.

Definition 7 (Matrix representation) Let $G = (S^1, S^2, (A_i)_{i \in S^1 \cup S^2}, p, c, \gamma)$ be a 2TBSG.
Assume WLOG that $S = [n] = \{1, \ldots, n\}$ and $A = [m] = \{1, \ldots, m\}$.

• We define the probability matrix $P \in \mathbb{R}^{m \times n}$ by $P_{a,i} = (p(a))_i$, for all $a \in A$ and $i \in S$.

• We define the cost vector $c \in \mathbb{R}^m$ by $c_a = c(a)$, for all $a \in A$.

• We define the source matrix $J \in \{0, 1\}^{m \times n}$ by $J_{a,i} = 1$ if and only if $a \in A_i$, for all $a \in A$
and $i \in S$.

• We define the ownership matrix $\mathcal{I} \in \{-1, 0, 1\}^{n \times n}$ by $\mathcal{I}_{i,j} = 0$ if $i \ne j$, $\mathcal{I}_{i,i} = -1$ if $i \in S^1$,
and $\mathcal{I}_{i,i} = 1$ if $i \in S^2$.

Note that $P_{a,i}$ is the probability of moving to state $i$ when using action $a$. For a matrix
$M \in \mathbb{R}^{m \times n}$ and a subset of indices $B \subseteq [m]$, we let $M_B$ be the sub-matrix of $M$ consisting of rows
with indices in $B$. Also, for any $i \in [m]$, we let $M_i \in \mathbb{R}^{1 \times n}$ be the $i$-th row of $M$. We use similar
notation for vectors.
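The matrix representation of Definition 7 can be illustrated on a tiny game. The sketch below uses a hypothetical two-state game invented for this example (it is not the game of Figure 1), with 0-based indices and plain Python lists; the variable names are assumptions.

```python
from fractions import Fraction as F

# Hypothetical toy game: state 0 belongs to Player 1, state 1 to Player 2.
owner = [1, 2]
# Each action is (source state, cost, probability distribution over next states).
actions = [
    (0, F(3), [F(1), F(0)]),        # action 0: stay in state 0
    (0, F(1), [F(1, 2), F(1, 2)]),  # action 1: move to either state with probability 1/2
    (1, F(0), [F(0), F(1)]),        # action 2: stay in state 1
    (1, F(2), [F(1), F(0)]),        # action 3: move to state 0
]
n, m = len(owner), len(actions)

# Probability matrix (m x n), cost vector (length m), source matrix (m x n).
P = [list(dist) for (_, _, dist) in actions]
c = [cost for (_, cost, _) in actions]
J = [[1 if src == i else 0 for i in range(n)] for (src, _, _) in actions]
# Ownership matrix (n x n): -1 on the diagonal for Player 1, +1 for Player 2.
own = [[(-1 if owner[i] == 1 else 1) if i == j else 0 for j in range(n)]
       for i in range(n)]

def rows(matrix, indices):
    """Sub-matrix consisting of the rows with the given indices."""
    return [matrix[a] for a in indices]

# A strategy profile picks one action per state; listing actions by state
# makes the corresponding rows of J the identity matrix.
sigma = [0, 2]
P_sigma = rows(P, sigma)
c_sigma = [c[a] for a in sigma]
```

Each row of `P` sums to one, and `rows(J, sigma)` is the identity matrix, which is the ordering convention assumed in the text for strategy profiles.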

Definition 8 (Strategies and strategy profiles) A strategy $\sigma^k : S^k \to A^k$ for Player $k \in
\{1, 2\}$ maps every state $i \in S^k$ to an action $\sigma^k(i) \in A_i$ applicable from state $i$. A strategy profile
$\sigma = (\sigma^1, \sigma^2)$ is a pair of strategies, one for each player. We let $\Sigma^k$ be the set of strategies for Player
$k$, and $\Sigma = \Sigma^1 \times \Sigma^2$ be the set of strategy profiles.

We view a strategy profile $\sigma = (\sigma^1, \sigma^2)$ as a map $\sigma : S \to A$ from states to actions, such that
$\sigma(i) = \sigma^k(i)$ for all $i \in S^k$ and $k \in \{1, 2\}$.

A strategy $\sigma^k \in \Sigma^k$ can be viewed as a subset $\sigma^k \subseteq A^k$ of actions such that $\sigma^k \cap A_i = \{\sigma^k(i)\}$
for all $i \in S^k$. A strategy profile $\sigma = (\sigma^1, \sigma^2) \in \Sigma$ can be viewed similarly as a subset of actions
$\sigma = \sigma^1 \cup \sigma^2 \subseteq A$. Note that $P_\sigma$ is an $n \times n$ matrix for every $\sigma \in \Sigma$. We assume WLOG that
actions are ordered such that $J_\sigma = I$, where $I$ is the identity matrix, for all $\sigma \in \Sigma$. Figure 1 shows
a strategy profile $\sigma$ represented by bold gray edges, the corresponding matrix $P_\sigma$, and the vector
$c_\sigma$.

The matrix $P_\sigma$ defines a Markov chain. In particular, the probability of being in the $j$-th state
after $t$ steps when starting in state $i$ is $(P_\sigma^t)_{i,j}$. In Figure 1 such probabilities are shown in the table
in the lower right corner, where $e_1$ is the first unit vector. We say that the players play according to
$\sigma$ if whenever the pebble is on state $i \in S^k$, Player $k$ uses action $\sigma(i)$. Let $i \in S$ be some state and
$t$ some number. The expected cost of the $t$-th action used, when starting in state $i$, is $(P_\sigma^t)_i c_\sigma$. In particular, the expected
outcome is $\sum_{t \ge 0} \gamma^t (P_\sigma^t)_i c_\sigma$. The following lemma shows that this infinite series always converges.

Lemma 9 For every strategy profile $\sigma \in \Sigma$ the matrix $(I - \gamma P_\sigma)$ is non-singular, and

$$(I - \gamma P_\sigma)^{-1} = \sum_{t=0}^{\infty} \gamma^t P_\sigma^t .$$

The simple proof of Lemma 9 has been omitted. For details we refer to, e.g., [7].

Definition 10 (Value vectors) For every strategy profile $\sigma \in \Sigma$ we define the value vector $v^\sigma \in
\mathbb{R}^n$ by:

$$v^\sigma = (I - \gamma P_\sigma)^{-1} c_\sigma .$$

Figure 1: Example of a simple 2TBSG and a strategy profile $\sigma$.

The $i$-th component of the value vector $v^\sigma$, for a given strategy profile $\sigma$, is the expected
outcome over plays starting in $i \in S$, when the players play according to $\sigma$.

It follows from Lemma 9 and Definition 10 that $v^\sigma$ is the unique solution to:

$$v^\sigma = c_\sigma + \gamma P_\sigma v^\sigma . \quad (1)$$
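The value vector of a fixed strategy profile can be computed exactly by solving the linear system behind Equation (1), and compared with a truncation of the series in Lemma 9. The sketch below assumes a small hypothetical profile; `solve`, `value_vector`, and `truncated_outcome` are illustrative names, and the elimination routine is a plain Gauss-Jordan solver rather than anything prescribed by the paper.

```python
from fractions import Fraction as F

def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination (a must be non-singular)."""
    n = len(b)
    aug = [[F(v) for v in a[i]] + [F(b[i])] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def value_vector(P_sigma, c_sigma, gamma):
    """Solve (I - gamma * P_sigma) v = c_sigma."""
    n = len(c_sigma)
    a = [[(1 if i == j else 0) - gamma * P_sigma[i][j] for j in range(n)]
         for i in range(n)]
    return solve(a, c_sigma)

def truncated_outcome(P_sigma, c_sigma, gamma, rounds):
    """Partial sum of gamma^t * P_sigma^t * c_sigma over the first `rounds` terms."""
    n = len(c_sigma)
    total = [F(0)] * n
    term = list(c_sigma)          # P_sigma^t c_sigma for the current t
    weight = F(1)                 # gamma^t
    for _ in range(rounds):
        total = [total[i] + weight * term[i] for i in range(n)]
        term = [sum(P_sigma[i][j] * term[j] for j in range(n)) for i in range(n)]
        weight *= gamma
    return total

# Hypothetical profile: state 0 moves to either state with probability 1/2 at cost 1,
# state 1 loops at cost 0.
gamma = F(1, 2)
P_sigma = [[F(1, 2), F(1, 2)], [F(0), F(1)]]
c_sigma = [F(1), F(0)]
v = value_vector(P_sigma, c_sigma, gamma)
```

Here `v` equals `[4/3, 0]`, and the partial sums of the series approach the same vector geometrically fast.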

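Definition 11 (Lower and upper values) We define the lower value vector $\underline{v} \in \mathbb{R}^n$ and upper
value vector $\overline{v} \in \mathbb{R}^n$ by:

$$\forall i \in S : \quad \overline{v}_i = \min_{\sigma^1 \in \Sigma^1} \max_{\sigma^2 \in \Sigma^2} v_i^{(\sigma^1, \sigma^2)}$$

$$\forall i \in S : \quad \underline{v}_i = \max_{\sigma^2 \in \Sigma^2} \min_{\sigma^1 \in \Sigma^1} v_i^{(\sigma^1, \sigma^2)}$$

Shapley [19] showed that $\underline{v} = \overline{v}$. Hence, we may define the optimal value vector as $v^* := \underline{v} = \overline{v}$.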

Definition 12 (Optimal strategies) A strategy $\sigma^1 \in \Sigma^1$ is optimal if and only if:

$$\forall i \in S : \quad \max_{\sigma^2 \in \Sigma^2} v_i^{(\sigma^1, \sigma^2)} = v_i^* .$$

Similarly, a strategy $\sigma^2 \in \Sigma^2$ is optimal if and only if:

$$\forall i \in S : \quad \min_{\sigma^1 \in \Sigma^1} v_i^{(\sigma^1, \sigma^2)} = v_i^* .$$

We say that a strategy profile $\sigma = (\sigma^1, \sigma^2) \in \Sigma$ is optimal if and only if $\sigma^1$ and $\sigma^2$ are optimal.

Note that an optimal strategy for Player 1 (Player 2) minimizes (maximizes) the values of all
states simultaneously. Hence, it is not immediately clear that optimal strategies always exist. This
was, however, shown by Shapley [19]. Solving a 2TBSG means finding an optimal strategy profile
(or equivalently the optimal value vector).

Definition 13 (Reduced costs) For every strategy profile $\sigma \in \Sigma$ we define the vector of reduced
costs $\bar{c}^\sigma \in \mathbb{R}^m$ by:

$$\forall i \in S, a \in A_i : \quad \bar{c}^\sigma_a = c_a + \gamma P_a v^\sigma - v^\sigma_i .$$

The following theorem establishes a connection between optimal strategies and reduced costs. 
For details see, e.g., 0E]. 

Theorem 14 (Optimality condition) A strategy profile $\sigma \in \Sigma$ is optimal if and only if $(\bar{c}^\sigma)_{A^1} \ge 0$
and $(\bar{c}^\sigma)_{A^2} \le 0$.

2.3 LCPs for solving 2TBSGs 

Jurdziński and Savani [10] showed how the problem of solving deterministic 2TBSGs can be reduced
to the problem of solving P-matrix LCPs. We next show that the same reduction works for general
2TBSGs.

Throughout this section we let $G = (S^1, S^2, (A_i)_{i \in S}, p, c, \gamma)$ be some 2TBSG and $(P, c, J, \mathcal{I}, \gamma)$
be the corresponding matrix representation. We assume that there are exactly two actions available
from every state, i.e., $|A_i| = 2$ for all $i \in S$. We partition $A$ into two disjoint strategy profiles $\sigma$
and $\tau$.

An LCP for solving $G$ can be derived as follows. Consider the following system of linear
equations and inequalities, where $w, y, z \in \mathbb{R}^n$ are variables.

$$(I - \gamma P_\sigma) y - \mathcal{I} w = c_\sigma \quad (2)$$
$$(I - \gamma P_\tau) y - \mathcal{I} z = c_\tau \quad (3)$$
$$w^T z = 0 \quad (4)$$
$$w, z \ge 0 \quad (5)$$

Lemma 15 If $w, y, z \in \mathbb{R}^n$ is a solution to (2), (3), (4), and (5), then $y$ is the optimal value
vector, a strategy profile $\pi$ is optimal if $\pi \subseteq \{\sigma(i) \mid i \in [n] \wedge w_i = 0\} \cup \{\tau(i) \mid i \in [n] \wedge z_i = 0\}$, and
such a strategy profile exists.

Proof Let $B = \{\sigma(i) \mid i \in [n] \wedge w_i = 0\} \cup \{\tau(i) \mid i \in [n] \wedge z_i = 0\}$, and let $\Pi$ be the set of all
strategy profiles contained in $B$. Since $w$ and $z$ satisfy (4) and (5), for every $i$ either $w_i = 0$ or $z_i = 0$, so we know that $\Pi \ne \emptyset$.

Let $a \in B \cap A_i$ for some $i \in S$. It follows from (2) and (3) that $y_i - \gamma P_a y = c_a$. Hence, we get
from Equation (1) that $y = v^\pi$, for every $\pi \in \Pi$. Combining this with (2), (3), and (5) we get that:

$$\forall i \in S^1, a \in A_i : \quad v^\pi_i - \gamma P_a v^\pi \le c_a$$
$$\forall i \in S^2, a \in A_i : \quad v^\pi_i - \gamma P_a v^\pi \ge c_a$$

It follows from Definition 13 and Theorem 14 that $\pi$ is an optimal strategy profile. □

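We know from Lemma 9 that $(I - \gamma P_\tau)$ is non-singular. Hence, (3) can be equivalently expressed
as:

$$y = (I - \gamma P_\tau)^{-1}(c_\tau + \mathcal{I} z) .$$

Eliminating $y$ in (2), and using that $\mathcal{I}\mathcal{I} = I$, we then get the following equivalent equation:

$$(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}(c_\tau + \mathcal{I} z) - \mathcal{I} w = c_\sigma \iff$$
$$\mathcal{I}(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}\mathcal{I} z - w = \mathcal{I} c_\sigma - \mathcal{I}(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} c_\tau \iff$$
$$w = (\mathcal{I}(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} c_\tau - \mathcal{I} c_\sigma) + \mathcal{I}(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}\mathcal{I} z \quad (6)$$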

To simplify equation (6) we make the following definition.

Definition 16 ($M_{G,\sigma,\tau}$ and $q_{G,\sigma,\tau}$) Let $G$ be a 2TBSG with matrix representation $(P, c, J, \mathcal{I}, \gamma)$,
and let the set of actions of $G$ be partitioned into two disjoint strategy profiles $\sigma$ and $\tau$. We define
$M_{G,\sigma,\tau} \in \mathbb{R}^{n \times n}$ and $q_{G,\sigma,\tau} \in \mathbb{R}^n$ by:

$$M_{G,\sigma,\tau} = \mathcal{I}(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}\mathcal{I}$$
$$q_{G,\sigma,\tau} = \mathcal{I}(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} c_\tau - \mathcal{I} c_\sigma .$$

I.e., equation (6) can be stated as $w = q_{G,\sigma,\tau} + M_{G,\sigma,\tau} z$. It follows that (2), (3), (4), and (5) can
be equivalently stated as $y = (I - \gamma P_\tau)^{-1}(c_\tau + \mathcal{I} z)$ and:

$$w = q_{G,\sigma,\tau} + M_{G,\sigma,\tau} z$$
$$w^T z = 0$$
$$w, z \ge 0$$

Hence, a solution to the LCP $(M_{G,\sigma,\tau}, q_{G,\sigma,\tau})$ gives a solution to (2), (3), (4), and (5), which, using
Lemma 15, solves the 2TBSG $G$. We say that $(M_{G,\sigma,\tau}, q_{G,\sigma,\tau})$ solves $G$.
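The reduction can be traced end to end on a tiny game. The sketch below is only an illustration under stated assumptions: the two-state game, the helper names, and the brute-force LCP solver are all invented for this example. The solver enumerates which of $w_i$, $z_i$ is set to zero for every $i$, which takes exponential time and is unrelated to the interior point methods studied in the paper.

```python
from fractions import Fraction as F
from itertools import product

def solve(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination (a must be non-singular)."""
    n = len(b)
    aug = [[F(v) for v in a[i]] + [F(b[i])] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def shifted(Pm, gamma):
    """The matrix I - gamma * Pm."""
    n = len(Pm)
    return [[(1 if i == j else 0) - gamma * Pm[i][j] for j in range(n)] for i in range(n)]

def lcp_data(P_sigma, P_tau, c_sigma, c_tau, own, gamma):
    """Return (M, q) as in Definition 16; `own` is the diagonal of the ownership matrix."""
    n = len(own)
    A_sigma, A_tau = shifted(P_sigma, gamma), shifted(P_tau, gamma)
    def apply(vec):  # own * (I - gamma P_sigma) * (I - gamma P_tau)^{-1} * vec
        u = solve(A_tau, vec)
        return [own[i] * sum(A_sigma[i][j] * u[j] for j in range(n)) for i in range(n)]
    cols = [apply([own[j] if i == j else 0 for i in range(n)]) for j in range(n)]
    M = [[cols[j][i] for j in range(n)] for i in range(n)]
    t = apply(c_tau)
    q = [t[i] - own[i] * c_sigma[i] for i in range(n)]
    return M, q

def solve_lcp(M, q):
    """Brute force: try every choice of which of w_i, z_i is forced to zero."""
    n = len(q)
    for pattern in product([0, 1], repeat=n):
        B = [i for i in range(n) if pattern[i]]   # indices where w_i = 0
        z = [F(0)] * n
        if B:
            zB = solve([[M[i][j] for j in B] for i in B], [-q[i] for i in B])
            for i, val in zip(B, zB):
                z[i] = val
        w = [q[i] + sum(M[i][j] * z[j] for j in range(n)) for i in range(n)]
        if all(x >= 0 for x in w) and all(x >= 0 for x in z):
            return w, z
    return None

# Hypothetical game: state 0 (Player 1, minimizer), state 1 (Player 2, maximizer).
gamma = F(1, 2)
own = [-1, 1]
P_sigma, c_sigma = [[F(1, 2), F(1, 2)], [F(1), F(0)]], [F(1), F(2)]
P_tau, c_tau = [[F(1), F(0)], [F(0), F(1)]], [F(3), F(0)]

M, q = lcp_data(P_sigma, P_tau, c_sigma, c_tau, own, gamma)
w, z = solve_lcp(M, q)
y = solve(shifted(P_tau, gamma), [c_tau[i] + own[i] * z[i] for i in range(2)])
```

In this example the solution has `w = [0, 0]`, so both actions of the profile called `sigma` here are optimal, and `y` is the value vector `[12/5, 16/5]` of that profile.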

Jurdziński and Savani [10] showed that $M_{G,\sigma,\tau}$ is a P-matrix when $G$ is deterministic. To prove
the same for general 2TBSGs we introduce the following lemma. The lemma is also used in the later
parts of the paper. To understand the use of $v$ in the lemma observe that $x^T(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}x =
x^T(I - \gamma P_\sigma)v$.

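Lemma 17 Let $x$ be a non-zero vector, $v = (I - \gamma P_\tau)^{-1}x$, and $j \in \operatorname{argmax}_i |v_i|$. Then we have
that:

$$|x_j| \ge (1 - \gamma)|v_j| > 0 . \quad (7)$$
$$\forall i : |x_i| \le (1 + \gamma)|v_j| . \quad (8)$$
$$x_j((I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}x)_j \ge (1 - \gamma) x_j v_j > 0 . \quad (9)$$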

Proof Observe first that $v$ is the unique solution to $v = x + \gamma P_\tau v$. In fact, we can interpret $v$
as the value vector for $\tau$ when the costs $c_\tau$ have been replaced by $x$. If $v = 0$ then this implies
that $0 = x + 0$, which is a contradiction. Thus, $v \ne 0$ and in particular $v_j \ne 0$. Since, for
every $i$, the entries of $(P_\tau)_i$ are non-negative and sum to one we have that $|\gamma(P_\tau)_i v| \le \gamma|v_j|$. The
equations $v_i = x_i + \gamma(P_\tau)_i v$, for all $i$, then imply that:

$$|x_j| = |v_j - \gamma(P_\tau)_j v| \ge |v_j| - |\gamma(P_\tau)_j v| \ge |v_j| - |\gamma v_j| = (1 - \gamma)|v_j| > 0 , \text{ and}$$
$$\forall i : |x_i| = |v_i - \gamma(P_\tau)_i v| \le |v_i| + |\gamma(P_\tau)_i v| \le |v_j| + |\gamma v_j| = (1 + \gamma)|v_j| .$$

This proves (7) and (8).

We next observe that $v_j$ and $x_j$ have the same sign. This again follows from $|\gamma(P_\tau)_j v| \le \gamma|v_j|$
and $v_j = x_j + \gamma(P_\tau)_j v$. Using that $\operatorname{sgn}(v_j) = \operatorname{sgn}(x_j)$, and that $|(P_\sigma)_j v| \le |v_j|$, we can now see that:

$$x_j((I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}x)_j = x_j((I - \gamma P_\sigma)v)_j = x_j v_j - \gamma x_j (P_\sigma)_j v$$
$$\ge x_j v_j - \gamma x_j v_j = (1 - \gamma) x_j v_j > 0 .$$

This proves (9). □

We know from Lemma 3 that the matrix $M_{G,\sigma,\tau}$ is a P-matrix if and only if for every $x \ne 0$
there exists a $j \in [n]$ such that $x_j(M_{G,\sigma,\tau}x)_j > 0$. Since $\mathcal{I}x \ne 0$, inequality (9) in Lemma 17, applied to $\mathcal{I}x$, shows
that $x_j(M_{G,\sigma,\tau}x)_j > 0$ for $j \in \operatorname{argmax}_i |((I - \gamma P_\tau)^{-1}\mathcal{I}x)_i|$. Hence, $M_{G,\sigma,\tau}$ is a P-matrix.

We summarize the results of this section in the following theorem. 

Theorem 18 Let $G$ be a 2TBSG, and let $\sigma$ and $\tau$ be two disjoint strategy profiles that form a
partition of the set of actions of $G$. Then the optimal value vector for $G$ is $v^* = (I - \gamma P_\tau)^{-1}(c_\tau + \mathcal{I}z)$,
where $(w, z)$ is a solution to the LCP $(M_{G,\sigma,\tau}, q_{G,\sigma,\tau})$. Furthermore, $M_{G,\sigma,\tau}$ is a P-matrix.

Recall that Kojima et al. [12] showed that every P-matrix is a $P_*$-matrix. Hence, we have shown
that $M_{G,\sigma,\tau}$ is a $P_*$-matrix.
3 The $P_*(\kappa)$ property for 2TBSGs

Let $G$ be a 2TBSG with matrix representation $(P, c, J, \mathcal{I}, \gamma)$, and let $\sigma$ and $\tau$ be two disjoint strategy
profiles that form a partition of the set of actions of $G$. Recall that $G$ can be solved by solving the
LCP $(M_{G,\sigma,\tau}, q_{G,\sigma,\tau})$. In this section we provide essentially tight upper and lower bounds on the
smallest number $\kappa$ for which the matrix $M_{G,\sigma,\tau}$ is guaranteed to be a $P_*(\kappa)$-matrix. More precisely,
we first show that for $\kappa = \frac{n}{(1-\gamma)^2} - \frac{1}{4}$, the matrix $M_{G,\sigma,\tau}$ is always a $P_*(\kappa)$-matrix. We then also show
that for every $n > 2$ and $\gamma < 1$ there exists a game $G_n$, and two strategy profiles $\sigma_n$ and $\tau_n$, such
that $M_{G_n,\sigma_n,\tau_n}$ is not a $P_*(\kappa)$-matrix for any $\kappa < \frac{(n-2)\gamma^2}{8(1-\gamma)^2} - \frac{1}{4}$. It follows that the unified interior
point method of Kojima et al. [12] solves the 2TBSG $G$ in time $O(\frac{n^{4.5}L}{(1-\gamma)^2})$, where $L$ is the number
of bits required to describe $G$, and that this bound cannot be improved further only by bounding
$\kappa$.

Recall that $M_{G,\sigma,\tau} = \mathcal{I}(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}\mathcal{I}$, and define $M := \mathcal{I}M_{G,\sigma,\tau}\mathcal{I} = (I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}$.
It is easy to see that $M_{G,\sigma,\tau}$ is a $P_*(\kappa)$-matrix for some $\kappa \ge 0$ if and only if $M$ is. Indeed, the
inequality of Definition 5 must hold for all $x \in \mathbb{R}^n$, and we can therefore substitute $x$ by $\mathcal{I}x$. Hence,
for the remainder of this section we will bound the $\kappa$ for which $M$ is a $P_*(\kappa)$-matrix.

Theorem 19 Let $n$ and $0 < \gamma < 1$ be given. For any $\gamma$-discounted 2TBSG $G$ with $n$ states, the
matrix $M_{G,\sigma,\tau}$, where $\sigma$ and $\tau$ partition the actions of $G$, is a $P_*(\kappa)$-matrix for $\kappa = \frac{n}{(1-\gamma)^2} - \frac{1}{4}$.

Proof As discussed above we may prove the theorem by bounding $\kappa$ for $M = (I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}$
instead of for $M_{G,\sigma,\tau}$. Thus, we need to find a number $\kappa$, such that

$$\forall x \in \mathbb{R}^n : \sum_{i \in \delta_-(M)} x_i((I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}x)_i + (1 + 4\kappa) \sum_{i \in \delta_+(M)} x_i((I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}x)_i \ge 0 ,$$

where $\delta_-(M) = \{i \in [n] \mid x_i(Mx)_i < 0\}$ and $\delta_+(M) = \{i \in [n] \mid x_i(Mx)_i > 0\}$.

Let $x$ be any non-zero vector (the expression is trivially satisfied for $x = 0$), $v = (I - \gamma P_\tau)^{-1}x$,
and $j \in \operatorname{argmax}_i |v_i|$. To prove the theorem we will estimate $\sum_{i \in \delta_-(M)} x_i((I - \gamma P_\sigma)v)_i$ and
$\sum_{i \in \delta_+(M)} x_i((I - \gamma P_\sigma)v)_i$ separately.

Using Lemma 17 we see that:

$$\forall i : |x_i((I - \gamma P_\sigma)v)_i| \le |x_i|(|v_j| + \gamma|v_j|) \le (1 + \gamma)^2|v_j|^2 < 4|v_j|^2 ,$$

which implies that:

$$\sum_{i \in \delta_-(M)} x_i((I - \gamma P_\sigma)v)_i > -4n|v_j|^2 .$$

Similarly, from Lemma 17 we have that:

$$x_j(Mx)_j \ge (1 - \gamma)x_j v_j = (1 - \gamma)|x_j||v_j| \ge (1 - \gamma)^2|v_j|^2 ,$$

which implies that:

$$\sum_{i \in \delta_+(M)} x_i((I - \gamma P_\sigma)v)_i \ge (1 - \gamma)^2|v_j|^2 .$$
Figure 2: An example game $G_n$ and two strategy profiles $\sigma_n$ (solid) and $\tau_n$ (dashed), where
$M_{G_n,\sigma_n,\tau_n}$ essentially matches the upper bound given in Theorem 19.



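We conclude that:

$$\sum_{i \in \delta_-(M)} x_i((I - \gamma P_\sigma)v)_i + (1 + 4\kappa) \sum_{i \in \delta_+(M)} x_i((I - \gamma P_\sigma)v)_i > -4n|v_j|^2 + (1 + 4\kappa)(1 - \gamma)^2|v_j|^2 .$$

It follows that $M$ is a $P_*(\kappa)$-matrix when:

$$-4n|v_j|^2 + (1 + 4\kappa)(1 - \gamma)^2|v_j|^2 \ge 0 \iff$$
$$4\kappa(1 - \gamma)^2|v_j|^2 \ge (4n - (1 - \gamma)^2)|v_j|^2 \iff$$
$$\kappa \ge \frac{n}{(1 - \gamma)^2} - \frac{1}{4} .$$

□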



We next present a lower bound that essentially matches the upper bound given in Theorem 19.
The gap between the upper and lower bounds is close to a factor of 8 for $\gamma$ going to 1. Note that we
are mostly interested in the case when $\gamma$ is very close to 1, since it is known that the problem can
be solved in strongly polynomial time when $\gamma$ is a fixed constant [8]. We establish the lower bound
using the family of games $\{G_n \mid n > 2\}$ shown in Figure 2. Figure 2 also shows two strategy profiles $\sigma_n$
and $\tau_n$ shown as solid and dashed arrows, respectively. Formally, the games are defined as follows.

The game $G_n$. For a given $n$, let $G_n$ be the following game containing $n$ states, all belonging to
Player 2. For $i \le n - 2$, state $i$ has two actions: one leading to state $n - 1$ and one leading to state
$n$. States $n - 1$ and $n$ have two self-loops each. I.e., the game is deterministic. The cost vector $c$ can
be arbitrary, and the discount factor $\gamma$ will be specified by the analysis. We also define two disjoint
strategy profiles $\sigma_n$ and $\tau_n$ that partition the set of actions. The strategy profile $\sigma_n$ contains for all
states $i \le n - 2$ the action leading to state $n - 1$, and the strategy profile $\tau_n$ contains for all states
$i \le n - 2$ the action leading to state $n$. Furthermore, at states $n - 1$ and $n$, the strategy profiles
$\sigma_n$ and $\tau_n$ contain a self-loop each.



Theorem 20 Let $n > 2$ and $0 < \gamma < 1$ be given. For the 2TBSG $G_n$, the matrix $M_{G_n,\sigma_n,\tau_n}$ is not
a $P_*(\kappa)$-matrix, for $\kappa < \frac{(n-2)\gamma^2}{8(1-\gamma)^2} - \frac{1}{4} = \Omega(\frac{n}{(1-\gamma)^2})$.

Proof Notice first that since all states belong to Player 2, $\mathcal{I}$ is the identity matrix. Thus
$M = M_{G_n,\sigma_n,\tau_n}$. We then need to find a number $\kappa \ge 0$, such that:

$$\forall x : \sum_{i \in \delta_-(M)} x_i((I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}x)_i + (1 + 4\kappa) \sum_{i \in \delta_+(M)} x_i((I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}x)_i \ge 0 .$$
Let $x \in \mathbb{R}^n$ be defined by, for all $i \in [n]$:

$$x_i = \begin{cases} \frac{\gamma}{1-\gamma} & \text{if } i \le n - 2 \\ 1 & \text{if } i = n - 1 \\ -1 & \text{if } i = n . \end{cases}$$

Let $v = (I - \gamma P_\tau)^{-1}x$ be the value vector for $\tau$ when using costs $x$. By straightforward calculation,
using Equation (1), we see that

$$v_i = \begin{cases} 0 & \text{if } i \le n - 2 \\ \frac{1}{1-\gamma} & \text{if } i = n - 1 \\ -\frac{1}{1-\gamma} & \text{if } i = n . \end{cases}$$

1-7 

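Also, let $r_i = x_i((I - \gamma P_\sigma)v)_i = x_i v_i - \gamma x_i(P_\sigma v)_i$. Again, by straightforward calculations we
see that:

$$\forall i \le n - 2 : \quad r_i = -\gamma \cdot \frac{\gamma}{1-\gamma} \cdot \frac{1}{1-\gamma} = -\left(\frac{\gamma}{1-\gamma}\right)^2$$
$$r_{n-1} = \frac{1}{1-\gamma} - \frac{\gamma}{1-\gamma} = 1$$
$$r_n = \frac{1}{1-\gamma} - \frac{\gamma}{1-\gamma} = 1$$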

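Therefore we have that:

$$\sum_{i \in \delta_-(M)} x_i((I - \gamma P_\sigma)v)_i = -(n - 2)\left(\frac{\gamma}{1-\gamma}\right)^2 ,$$

and that:

$$\sum_{i \in \delta_+(M)} x_i((I - \gamma P_\sigma)v)_i = 2 .$$

Hence, for $\sum_{i \in \delta_-(M)} x_i((I - \gamma P_\sigma)v)_i + (1 + 4\kappa) \sum_{i \in \delta_+(M)} x_i((I - \gamma P_\sigma)v)_i \ge 0$ to be true we need
that:

$$(n - 2)\left(\frac{\gamma}{1-\gamma}\right)^2 \le (1 + 4\kappa) \cdot 2 \iff$$
$$\kappa \ge \frac{(n - 2)\gamma^2}{8(1 - \gamma)^2} - \frac{1}{4} .$$

□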

4 Bounds for the potential reduction algorithm 

The interior point potential reduction algorithm of Kojima et al. [13] for solving a P-matrix LCP
$(M, q)$ takes as input a parameter $\epsilon > 0$ and produces a feasible solution $(w, z)$ for which $w^T z < \epsilon$.
Following Ye [20], the running time of the potential reduction algorithm is upper bounded by
$O(\frac{-\delta}{\theta} n^4 \log \epsilon^{-1})$, where (i) $\delta$ is the smallest eigenvalue of $\frac{M + M^T}{2}$; and (ii) $\theta$ is the positive P-matrix
number of $M$ (Definition 4). In this section we are interested in bounding the running time of the
potential reduction algorithm when applied to 2TBSGs by studying the two quantities (i) $\delta$ and
(ii) $\theta$.

Throughout the section we let $G$ be a 2TBSG with matrix representation $(P, c, J, \mathcal{I}, \gamma)$, and $\sigma$
and $\tau$ be two disjoint strategy profiles that form a partition of the set of actions of $G$. To simplify
notation we let $M := M_{G,\sigma,\tau}$. To bound the running time of the potential reduction algorithm we
need to bound the smallest eigenvalue $\delta$ of $\frac{M + M^T}{2}$ and the positive P-matrix number $\theta(M)$ of $M$.
We study the smallest eigenvalue of $\frac{M + M^T}{2}$ in Section 4.1 and the positive P-matrix number $\theta(M)$
in Section 4.2. For both quantities we provide upper and lower bounds that are essentially tight.

More precisely, we show that we always have smallest eigenvalue $\delta \ge -\frac{(1+\gamma)\sqrt{n}}{1-\gamma}$, and that
there exists a family of 2TBSGs $G_n$, with corresponding strategy profiles $\sigma_n$ and $\tau_n$, for which the
smallest eigenvalue is at most $1 - \frac{\gamma\sqrt{n-2}}{\sqrt{2}(1-\gamma)}$. I.e., the gap is only a factor of $2\sqrt{2}$ for $\gamma$ going to 1.
We also show that we always have $\theta(M) \ge \frac{(1-\gamma)^2}{(1+\gamma)^2 n}$, and that there exists a family of 2TBSGs $G_n$,
with corresponding strategy profiles $\sigma_n$ and $\tau_n$, for which the positive P-matrix number satisfies
$\theta(M_{G_n,\sigma_n,\tau_n}) \le \frac{(1-\gamma)^2}{4\gamma^2(n-2)}$. I.e., the bound is tight when $\gamma$ goes to 1.

It is important to note that the upper bound for the smallest eigenvalue $\delta$ and the upper bound
for the positive P-matrix number $\theta(M_{G_n,\sigma_n,\tau_n})$ are obtained using the same game $G_n$ and the same
strategy profiles $\sigma_n$ and $\tau_n$. In fact, for both bounds we use the same game and strategy profiles as
were used in the proof of Theorem 20. I.e., $G_n$, $\sigma_n$, and $\tau_n$ are shown in Figure 2. Hence, for this
particular game we achieve the worst-case ratio of $\frac{-\delta}{\theta} = \Theta\left(\frac{n\sqrt{n}}{(1-\gamma)^3}\right)$.
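As a rough numeric illustration of how the bounds combine, the following sketch evaluates the four closed-form bounds stated in this section and the resulting lower and upper estimates on $-\delta/\theta$ for the game $G_n$. The function names and the sample values of $n$ and $\gamma$ are illustrative assumptions and do not come from the paper.

```python
import math

def delta_lower(n, gamma):
    """General lower bound on the smallest eigenvalue of (M + M^T)/2."""
    return -(1 + gamma) * math.sqrt(n) / (1 - gamma)

def delta_upper_gn(n, gamma):
    """Upper bound on the smallest eigenvalue for the family G_n."""
    return 1 - gamma * math.sqrt(n - 2) / (math.sqrt(2) * (1 - gamma))

def theta_lower(n, gamma):
    """General lower bound on the positive P-matrix number."""
    return (1 - gamma) ** 2 / ((1 + gamma) ** 2 * n)

def theta_upper_gn(n, gamma):
    """Upper bound on the positive P-matrix number for the family G_n."""
    return (1 - gamma) ** 2 / ((2 * gamma) ** 2 * (n - 2))

def ratio_range_gn(n, gamma):
    """Return (low, high) with low <= -delta/theta <= high on G_n.

    The lower estimate is only meaningful when delta_upper_gn is negative,
    i.e. when n is large enough or gamma is close enough to 1.
    """
    low = -delta_upper_gn(n, gamma) / theta_upper_gn(n, gamma)
    high = -delta_lower(n, gamma) / theta_lower(n, gamma)
    return low, high

# Sample values (assumed for illustration only).
n, gamma = 100, 0.99
low, high = ratio_range_gn(n, gamma)
scale = n * math.sqrt(n) / (1 - gamma) ** 3
print(low / scale, high / scale)
```

After dividing by $n\sqrt{n}/(1-\gamma)^3$, both estimates should remain bounded by constants, which is what the $\Theta$ statement expresses.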

4.1 Bounds for the smallest eigenvalue 

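We first lower bound the smallest eigenvalue of $\frac{M + M^T}{2}$, where $M = \mathcal{I}(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}\mathcal{I}$. We
let $\mathbb{R}^n_{\|\cdot\|_j = k}$ be the set of vectors in $\mathbb{R}^n$ such that each vector $v \in \mathbb{R}^n_{\|\cdot\|_j = k}$ has $\|v\|_j = k$.

Theorem 21 The matrix $\frac{M + M^T}{2}$ has smallest eigenvalue greater than $-\frac{(1+\gamma)\sqrt{n}}{1-\gamma} = -O\left(\frac{\sqrt{n}}{1-\gamma}\right)$.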

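Proof Look at the equation $\lambda x = \frac{M + M^T}{2} x$, where $\lambda$ is the smallest eigenvalue. We have that:

$$\begin{aligned}
\lambda x &= \frac{M + M^T}{2}\, x \\
&= \frac{\mathcal{I}(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}\mathcal{I} + \mathcal{I}(I - \gamma P_\tau^T)^{-1}(I - \gamma P_\sigma^T)\mathcal{I}}{2}\, x \\
&= \mathcal{I}\,\frac{(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} + (I - \gamma P_\tau^T)^{-1}(I - \gamma P_\sigma^T)}{2}\,\mathcal{I} x .
\end{aligned}$$

By letting $y = \mathcal{I}x$ we obtain the equation:

$$\lambda y = \frac{(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} + (I - \gamma P_\tau^T)^{-1}(I - \gamma P_\sigma^T)}{2}\, y .$$

We can without loss of generality assume that $y$ has two-norm equal to one, and by the triangle
inequality we therefore have:

$$\begin{aligned}
|\lambda| = \|\lambda y\|_2 &= \left\| \frac{(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} + (I - \gamma P_\tau^T)^{-1}(I - \gamma P_\sigma^T)}{2}\, y \right\|_2 \\
&\le \frac{1}{2}\left\|(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} y\right\|_2 + \frac{1}{2}\left\|(I - \gamma P_\tau^T)^{-1}(I - \gamma P_\sigma^T) y\right\|_2 .
\end{aligned}$$

We will bound $\|(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} y\|_2$ and $\|(I - \gamma P_\tau^T)^{-1}(I - \gamma P_\sigma^T) y\|_2$ separately. We
first observe that: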



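$$\begin{aligned}
\left\|(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} y\right\|_2
&\le \max_{v \in \mathbb{R}^n_{\|\cdot\|_\infty = 1}} \left\|(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} v\right\|_2 \\
&= \max_{v \in \mathbb{R}^n_{\|\cdot\|_\infty = 1}} \left\|(I - \gamma P_\sigma)\sum_{t=0}^{\infty} \gamma^t P_\tau^t v\right\|_2 \\
&\le \max_{v \in \mathbb{R}^n_{\|\cdot\|_\infty = 1}} \left\|(I - \gamma P_\sigma)\sum_{t=0}^{\infty} \gamma^t v\right\|_2 \\
&= \max_{v \in \mathbb{R}^n_{\|\cdot\|_\infty = 1}} \left\|(I - \gamma P_\sigma)\frac{v}{1-\gamma}\right\|_2 \\
&\le \frac{1}{1-\gamma} \max_{v \in \mathbb{R}^n_{\|\cdot\|_\infty = 1+\gamma}} \|v\|_2 \\
&= \frac{(1+\gamma)\sqrt{n}}{1-\gamma} .
\end{aligned}$$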

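Here, the first inequality comes from the fact that if a vector $v \in \mathbb{R}^n$ has two-norm equal to 1, then
it has infinity-norm equal to at most 1. The first equality follows from the lemma that expands
$(I - \gamma P_\tau)^{-1}$ as the series $\sum_{t=0}^{\infty} \gamma^t P_\tau^t$. To prove the
second inequality we use that $\|P_\tau^t v\|_\infty \le \|v\|_\infty$ for any $t \ge 0$, since the entries of $P_\tau$ are in $[0,1]$
and every row of $P_\tau$ sums to 1.
The third inequality follows from the fact that $\|(I - \gamma P_\sigma)v\|_\infty \le (1 + \gamma)\|v\|_\infty$. The last equality
comes from the fact that if a vector $v \in \mathbb{R}^n$ has infinity-norm equal to $1 + \gamma$ then it has two-norm
at most $(1 + \gamma)\sqrt{n}$.

We also have that:

$$\begin{aligned}
\left\|(I - \gamma P_\tau^T)^{-1}(I - \gamma P_\sigma^T) y\right\|_2
&\le \max_{v \in \mathbb{R}^n_{\|\cdot\|_1 = \sqrt{n}}} \left\|(I - \gamma P_\tau^T)^{-1}(I - \gamma P_\sigma^T) v\right\|_2 \\
&\le (1+\gamma)\sqrt{n} \max_{v \in \mathbb{R}^n_{\|\cdot\|_1 = 1}} \left\|(I - \gamma P_\tau^T)^{-1} v\right\|_2 \\
&= (1+\gamma)\sqrt{n} \max_{v \in \mathbb{R}^n_{\|\cdot\|_1 = 1}} \left\|\sum_{t=0}^{\infty} (\gamma P_\tau^T)^t v\right\|_2 \\
&\le (1+\gamma)\sqrt{n} \max_{v \in \mathbb{R}^n_{\|\cdot\|_1 = 1}} \left\|\sum_{t=0}^{\infty} \gamma^t v\right\|_2 \\
&= \frac{(1+\gamma)\sqrt{n}}{1-\gamma} \max_{v \in \mathbb{R}^n_{\|\cdot\|_1 = 1}} \|v\|_2 \\
&= \frac{(1+\gamma)\sqrt{n}}{1-\gamma} .
\end{aligned}$$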

Here, the first inequality comes from the fact that if a vector $v \in \mathbb{R}^n$ has two-norm equal to 1
then it has one-norm equal to at most $\sqrt{n}$. The second inequality follows from the fact that the
columns of $P_\sigma^T$ sum to 1 such that $\|(I - \gamma P_\sigma^T)v\|_1 \le (1 + \gamma)\|v\|_1$. The first equality follows from
Lemma 9. For the third inequality we again use that the columns of $P_\tau^T$ sum to 1, which implies
that $\|P_\tau^T v\|_1 \le \|v\|_1$. The last equality comes from the fact that if a vector $v \in \mathbb{R}^n$ has one-norm
equal to 1 then it has two-norm at most 1.

Hence,

$$|\lambda| \le \frac{1}{2}\cdot\frac{(1+\gamma)\sqrt{n}}{1-\gamma} + \frac{1}{2}\cdot\frac{(1+\gamma)\sqrt{n}}{1-\gamma} = \frac{(1+\gamma)\sqrt{n}}{1-\gamma} ,$$

completing the proof. □

We will now upper bound the smallest eigenvalue of $\frac{M_{G_n,\sigma_n,\tau_n} + M_{G_n,\sigma_n,\tau_n}^T}{2}$, where $G_n$, $\sigma_n$, and $\tau_n$
were defined in Section 3 (Figure 2).

Theorem 22 Let $n$ and $0 < \gamma < 1$ be given, and let $M := M_{G_n,\sigma_n,\tau_n}$. The matrix $\frac{M + M^T}{2}$ has
smallest eigenvalue at most $1 - \frac{\gamma\sqrt{n-2}}{\sqrt{2}(1-\gamma)}$.

Proof Let $x \in \mathbb{R}^n$ be defined by, for all $i \in [n]$:

$$x_i = \begin{cases} \alpha & \text{if } i < n-1 \\ 1 & \text{if } i = n-1 \\ -1 & \text{if } i = n , \end{cases}$$

where $\alpha$ is a parameter that will be specified later.

We will show that there is an $\alpha$ such that $x$ is an eigenvector with eigenvalue $1 - \frac{\gamma\sqrt{n-2}}{\sqrt{2}(1-\gamma)}$.
Hence, we look at the equation

$$\lambda x = \frac{M + M^T}{2}\, x = \frac{\mathcal{I}(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}\mathcal{I} + \mathcal{I}(I - \gamma P_\tau^T)^{-1}(I - \gamma P_\sigma^T)\mathcal{I}}{2}\, x .$$

Notice that $\mathcal{I}$ is the identity matrix. We will look at each term separately. In each case we will
evaluate the expression from right to left.

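We first evaluate $(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}x$. Let $v = (I - \gamma P_\tau)^{-1}x$. I.e., $v$ is the value vector of $\tau$
when the costs are replaced by $x$. Then, by simple calculation using the equation $v = x + \gamma P_\tau v$, we see that:

$$v_i = \begin{cases} \alpha + \frac{\gamma}{1-\gamma} & \text{if } i \le n-2 \\ \frac{1}{1-\gamma} & \text{if } i = n-1 \\ \frac{-1}{1-\gamma} & \text{if } i = n . \end{cases}$$

Let $r = (I - \gamma P_\sigma)v = (I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}x$. Then, by simple calculations, we see that:

$$r_i = \begin{cases} \alpha + \frac{2\gamma}{1-\gamma} & \text{if } i \le n-2 \\ 1 & \text{if } i = n-1 \\ -1 & \text{if } i = n . \end{cases}$$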



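Next, we evaluate $(I - \gamma P_\tau^T)^{-1}(I - \gamma P_\sigma^T)x$. Let $x' = (I - \gamma P_\sigma^T)x$. Since $\sigma_n$ moves every state $i \le n-2$ to state $n$, it is easy to see that for
all $i \in [n]$:

$$x'_i = \begin{cases} \alpha & \text{if } i \le n-2 \\ 1 - \gamma & \text{if } i = n-1 \\ -1 + \gamma - \gamma\alpha(n-2) & \text{if } i = n . \end{cases}$$

Let $r' = (I - \gamma P_\tau^T)^{-1}x'$. Note that no actions move to state $i$ for $i \le n-2$, meaning that $(P_\tau^T)_i = 0$.
Then by multiplying from the left with $(I - \gamma P_\tau^T)$, and using that $\tau_n$ moves every state $i \le n-2$ to state $n-1$, we see that:

$$\begin{aligned}
\forall i \le n-2: \quad r'_i &= x'_i + \gamma (P_\tau^T)_i r' = x'_i = \alpha \\
r'_{n-1} &= x'_{n-1} + \gamma (P_\tau^T)_{n-1} r' = x'_{n-1} + \gamma r'_{n-1} + \gamma \sum_{i=1}^{n-2} r'_i = \frac{1 - \gamma + \gamma\alpha(n-2)}{1-\gamma} \\
r'_n &= x'_n + \gamma (P_\tau^T)_n r' = x'_n + \gamma r'_n = \frac{-1 + \gamma - \gamma\alpha(n-2)}{1-\gamma} .
\end{aligned}$$

Therefore, if we let:

$$v' = \frac{(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1} + (I - \gamma P_\tau^T)^{-1}(I - \gamma P_\sigma^T)}{2}\, x = \frac{r + r'}{2} ,$$

we have that:

$$v'_i = \begin{cases} \alpha + \frac{\gamma}{1-\gamma} & \text{if } i < n-1 \\ 1 + \frac{\gamma\alpha(n-2)}{2(1-\gamma)} & \text{if } i = n-1 \\ -1 - \frac{\gamma\alpha(n-2)}{2(1-\gamma)} & \text{if } i = n . \end{cases}$$

We want $\lambda x = v'$, which is then the same as the equation system:

$$\begin{aligned}
\lambda\alpha &= \alpha + \frac{\gamma}{1-\gamma} \\
\lambda &= 1 + \frac{\gamma\alpha(n-2)}{2(1-\gamma)} .
\end{aligned}$$

By eliminating $\lambda$ we get:

$$\begin{aligned}
\alpha + \frac{\gamma}{1-\gamma} = \alpha + \frac{\gamma\alpha^2(n-2)}{2(1-\gamma)} \quad &\Rightarrow \\
\frac{\gamma}{1-\gamma} = \frac{\gamma\alpha^2(n-2)}{2(1-\gamma)} \quad &\Rightarrow \\
2 = \alpha^2(n-2) \quad &\Rightarrow \\
\alpha = \pm\sqrt{\frac{2}{n-2}} . &
\end{aligned}$$

Since we want to minimize $\lambda$ over $\alpha$ we choose $\alpha = -\sqrt{\frac{2}{n-2}}$ and get:

$$\lambda = 1 - \frac{\gamma\sqrt{n-2}}{\sqrt{2}(1-\gamma)} .$$

Hence, the matrix $\frac{M + M^T}{2}$ has smallest eigenvalue at most $1 - \frac{\gamma\sqrt{n-2}}{\sqrt{2}(1-\gamma)}$. □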
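The eigenvector claim of Theorem 22 can be checked numerically. The sketch below is an illustration under an assumption: it takes $G_n$ to be the game in which, for every state $i \le n-2$, $\tau_n$ moves to state $n-1$ and $\sigma_n$ moves to state $n$, while states $n-1$ and $n$ are absorbing, and in which $\mathcal{I}$ is the identity. This structure is inferred from the value vectors computed in the proofs (Figure 2 is not reproduced here), and all helper names are hypothetical.

```python
import math

def apply_m(x, gamma):
    """Return (I - gamma P_sigma)(I - gamma P_tau)^{-1} x for the assumed G_n.

    States are 0-indexed: index n-2 is state n-1, index n-1 is state n.
    """
    n = len(x)
    v_a = x[n - 2] / (1 - gamma)  # value of absorbing state n-1
    v_b = x[n - 1] / (1 - gamma)  # value of absorbing state n
    # tau: every state i <= n-2 moves to state n-1.
    v = [xi + gamma * v_a for xi in x[: n - 2]] + [v_a, v_b]
    # sigma: every state i <= n-2 moves to state n; absorbing states self-loop.
    return [vi - gamma * v_b for vi in v[: n - 2]] + [(1 - gamma) * v_a, (1 - gamma) * v_b]

def dense_m(n, gamma):
    """Build M column by column by applying it to unit vectors."""
    cols = [apply_m([1.0 if k == j else 0.0 for k in range(n)], gamma) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def check_eigenpair(n, gamma):
    """Return (lam, residual) where residual = max_i |(S x)_i - lam * x_i|
    for S = (M + M^T)/2 and the vector x used in the proof."""
    m = dense_m(n, gamma)
    s = [[(m[i][j] + m[j][i]) / 2 for j in range(n)] for i in range(n)]
    alpha = -math.sqrt(2 / (n - 2))
    x = [alpha] * (n - 2) + [1.0, -1.0]
    lam = 1 - gamma * math.sqrt(n - 2) / (math.sqrt(2) * (1 - gamma))
    sx = [sum(s[i][j] * x[j] for j in range(n)) for i in range(n)]
    residual = max(abs(sx[i] - lam * x[i]) for i in range(n))
    return lam, residual

print(check_eigenpair(10, 0.9))
```

A residual close to zero indicates that $x$ is an eigenvector for $\lambda$ under the assumed structure, so the smallest eigenvalue is at most $\lambda$.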

4.2 Bounds for the positive P-matrix number 

We will now lower bound the positive P-matrix number for any 2TBSG. 

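Theorem 23 Let $n$ and $0 < \gamma < 1$ be given. For any 2TBSG $G$ with $n$ states, the matrix
$M := M_{G,\sigma,\tau}$, where $\sigma$ and $\tau$ partition the actions of $G$, has positive P-matrix number, $\theta(M)$, at
least $\frac{(1-\gamma)^2}{(1+\gamma)^2 n} = \Omega\left(\frac{(1-\gamma)^2}{n}\right)$.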



Proof Recall that the positive P-matrix number of $M = M_{G,\sigma,\tau}$ is defined as:

$$\theta(M) = \min_{\|x\|_2 = 1} \max_{i \in [n]} x_i (Mx)_i .$$

Let $x \in \mathbb{R}^n_{\|\cdot\|_2 = 1}$ be given. Let $v = (I - \gamma P_\tau)^{-1}x$ and $j \in \arg\max_i |v_i|$. From Lemma 17 we know
that $x_j(Mx)_j \ge (1 - \gamma)|x_j v_j| \ge (1 - \gamma)^2 (v_j)^2$. We also know from Lemma 17 that $v_j^2 \ge \frac{x_i^2}{(1+\gamma)^2}$ for
all $i \in [n]$. Hence, we see that $x_j(Mx)_j \ge \frac{(1-\gamma)^2 x_i^2}{(1+\gamma)^2}$ for all $i \in [n]$. Since $\|x\|_2 = 1$ there must exist
an index $i$ such that $|x_i| \ge \frac{1}{\sqrt{n}}$. It follows that $x_j(Mx)_j \ge \frac{(1-\gamma)^2}{(1+\gamma)^2 n}$. Since this inequality holds for
all $x \in \mathbb{R}^n_{\|\cdot\|_2 = 1}$ we see that $\theta(M) \ge \frac{(1-\gamma)^2}{(1+\gamma)^2 n}$.

□
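The lower bound can be probed by random sampling. The following sketch is only a sanity check, not a proof, and rests on simplifying assumptions that are not taken from the paper: $\mathcal{I}$ is the identity, $\sigma$ and $\tau$ are deterministic and drawn at random as successor maps, and $(I - \gamma P_\tau)^{-1}x$ is approximated by a fixed number of iterations of $v \leftarrow x + \gamma P_\tau v$. All names are hypothetical.

```python
import math
import random

def max_diag_product(x, sigma, tau, gamma, iters=400):
    """Return max_i x_i (Mx)_i with M = (I - gamma P_sigma)(I - gamma P_tau)^{-1}.

    sigma[i] and tau[i] are the successor states of state i. The inverse is
    approximated by the truncated series sum_t gamma^t P_tau^t x; `iters`
    must be large enough for gamma**iters to be negligible.
    """
    n = len(x)
    v = list(x)
    for _ in range(iters):
        v = [x[i] + gamma * v[tau[i]] for i in range(n)]
    r = [v[i] - gamma * v[sigma[i]] for i in range(n)]
    return max(x[i] * r[i] for i in range(n))

def sample_check(n, gamma, trials, seed=0):
    """Return (smallest sampled value, bound (1-gamma)^2 / ((1+gamma)^2 n))."""
    rng = random.Random(seed)
    bound = (1 - gamma) ** 2 / ((1 + gamma) ** 2 * n)
    worst = float("inf")
    for _ in range(trials):
        sigma = [rng.randrange(n) for _ in range(n)]
        tau = [rng.randrange(n) for _ in range(n)]
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(t * t for t in x))
        x = [t / norm for t in x]  # normalise so that ||x||_2 = 1
        worst = min(worst, max_diag_product(x, sigma, tau, gamma))
    return worst, bound

print(sample_check(6, 0.8, 50))
```

Every sampled value of $\max_i x_i(Mx)_i$ should be at least the bound; random vectors typically lie far above it, since the minimum in the definition of $\theta(M)$ is attained only at specially chosen $x$.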

We will now upper bound the positive P-matrix number of $M_{G_n,\sigma_n,\tau_n}$. I.e., we once again use
the construction from Figure 2.

Theorem 24 Let $n$ and $0 < \gamma < 1$ be given. The matrix $M := M_{G_n,\sigma_n,\tau_n}$ has positive P-matrix
number $\theta(M) < \frac{(1-\gamma)^2}{(2\gamma)^2(n-2)}$.

Proof Recall that the positive P-matrix number of $M = M_{G_n,\sigma_n,\tau_n}$ is defined as:

$$\theta(M) = \min_{\|x\|_2 = 1} \max_{i \in [n]} x_i (Mx)_i .$$

In the following we consider a concrete vector $x \in \mathbb{R}^n$ defined by, for all $i \in [n]$:

$$x_i = \begin{cases} 1 & \text{if } i \le n-2 \\ -\alpha & \text{if } i = n-1 \\ \alpha & \text{if } i = n , \end{cases}$$

where $\alpha$ is a parameter that we specify later. We later normalize $x$, such that $\|x\|_2 = 1$. I.e., we
want to evaluate the following expression for all $i \in [n]$:

$$\frac{x_i (Mx)_i}{\|x\|_2^2} = \frac{x_i \left(\mathcal{I}(I - \gamma P_\sigma)(I - \gamma P_\tau)^{-1}\mathcal{I} x\right)_i}{\|x\|_2^2} .$$

Recall that $\mathcal{I}$ is the identity matrix. We will evaluate the expression from right to left.

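Let $v = (I - \gamma P_\tau)^{-1}x$. Then, by simple calculations using that $v = x + \gamma P_\tau v$, we see that:

$$v_i = \begin{cases} 1 - \frac{\alpha\gamma}{1-\gamma} & \text{if } i \le n-2 \\ \frac{-\alpha}{1-\gamma} & \text{if } i = n-1 \\ \frac{\alpha}{1-\gamma} & \text{if } i = n . \end{cases}$$

Let $r = (I - \gamma P_\sigma)v$. Then, by simple calculations, we see that:

$$r_i = \begin{cases} 1 - \frac{2\alpha\gamma}{1-\gamma} & \text{if } i \le n-2 \\ -\alpha & \text{if } i = n-1 \\ \alpha & \text{if } i = n . \end{cases}$$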



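Hence, $x_i r_i$ is:

$$x_i r_i = \begin{cases} 1 - \frac{2\alpha\gamma}{1-\gamma} & \text{if } i \le n-2 \\ \alpha^2 & \text{if } i = n-1 \\ \alpha^2 & \text{if } i = n . \end{cases}$$

We now let $\alpha = \frac{1-\gamma}{2\gamma}$, in which case we see that:

$$x_i r_i = \begin{cases} 0 & \text{if } i \le n-2 \\ \frac{(1-\gamma)^2}{(2\gamma)^2} & \text{if } i = n-1 \text{ or } i = n . \end{cases}$$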



We also see that $\|x\|_2^2 = n - 2 + 2\alpha^2 > n - 2$. It follows that the positive P-matrix number, $\theta(M)$,
is at most:

$$\max_{i \in [n]} \frac{x_i r_i}{\|x\|_2^2} < \frac{(1-\gamma)^2}{(2\gamma)^2(n-2)} .$$

□
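The witness vector from this proof can be evaluated directly. The sketch below again assumes, as an inference from the value vectors in the proof rather than from Figure 2, that in $G_n$ every state $i \le n-2$ moves to state $n-1$ under $\tau_n$ and to state $n$ under $\sigma_n$, that states $n-1$ and $n$ are absorbing, and that $\mathcal{I}$ is the identity. The function names are hypothetical.

```python
def apply_m(x, gamma):
    """Return r = (I - gamma P_sigma)(I - gamma P_tau)^{-1} x for the assumed G_n."""
    n = len(x)
    v_a = x[n - 2] / (1 - gamma)  # value of absorbing state n-1
    v_b = x[n - 1] / (1 - gamma)  # value of absorbing state n
    # tau: states i <= n-2 move to state n-1.
    v = [xi + gamma * v_a for xi in x[: n - 2]] + [v_a, v_b]
    # sigma: states i <= n-2 move to state n; absorbing states self-loop.
    return [vi - gamma * v_b for vi in v[: n - 2]] + [(1 - gamma) * v_a, (1 - gamma) * v_b]

def p_number_witness(n, gamma):
    """Return (value, bound): value = max_i x_i r_i / ||x||_2^2 for the proof's x,
    bound = (1-gamma)^2 / ((2 gamma)^2 (n-2))."""
    alpha = (1 - gamma) / (2 * gamma)
    x = [1.0] * (n - 2) + [-alpha, alpha]
    r = apply_m(x, gamma)
    norm_sq = sum(xi * xi for xi in x)
    value = max(xi * ri for xi, ri in zip(x, r)) / norm_sq
    bound = (1 - gamma) ** 2 / ((2 * gamma) ** 2 * (n - 2))
    return value, bound

print(p_number_witness(10, 0.9))
```

Because $\theta(M)$ is a minimum over all unit vectors, the normalised value obtained for this particular $x$ is an upper bound on $\theta(M)$, and it should fall strictly below the stated bound.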
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